SENTIMENT ANALYSIS AND TEXT MINING CAPSTONE PROJECT

Loading and analyzing the Amazon Mobile Reviews Dataset

Dataset Visualization

Wordclouds

We used the wordcloud package to get a quick overview of the most frequent words in the text corpus.
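A word cloud sizes each word by how often it occurs, so the same overview can be approximated without plotting by counting word frequencies. The sketch below uses only the standard library; the `top_words` helper and the `reviews` list are illustrative assumptions, not the notebook's own code or data.

```python
import re
from collections import Counter

def top_words(texts, n=5):
    """Return the n most frequent lowercase words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

# Hypothetical sample reviews, for illustration only
reviews = [
    "Great phone, great battery",
    "Battery died after a week",
    "Great value for the price",
]
print(top_words(reviews, n=2))
```

In practice, stop words such as "the" and "a" would dominate a real corpus, so they are usually filtered out before counting.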

Text Normalization

1. Word Features

2. Tokenization

Testing the function with a random review
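A minimal sketch of a tokenizer and a quick check on one review follows. It assumes a simple regular-expression approach; the `tokenize` name and the sample review are hypothetical, and the notebook may have relied on a library tokenizer instead.

```python
import re

def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    # \w+ matches letters, digits and underscores; the optional
    # apostrophe group keeps contractions such as "don't" in one token
    return re.findall(r"\w+(?:'\w+)?", text.lower())

# Hypothetical review text, for illustration only
sample_review = "I don't love the camera, but the battery lasts 2 days!"
tokens = tokenize(sample_review)
print(tokens)
```

Punctuation is dropped and case is folded, so "Camera," and "camera" map to the same token.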

3. Stemming

Importing the different libraries and modules used for stemming

Creating a stem_tokens function that takes a list of tokens as input and returns a list of stemmed tokens
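The shape of such a function can be sketched as below. To stay dependency-free, the sketch swaps in a crude suffix stripper (`simple_stem`, a hypothetical helper) in place of a real stemming algorithm such as Porter's; only the `stem_tokens` name comes from the text above.

```python
def simple_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stem_tokens(tokens, stemmer=simple_stem):
    """Take a list of tokens and return the list of stemmed tokens."""
    return [stemmer(token) for token in tokens]

print(stem_tokens(["charging", "worked", "cameras", "is"]))
```

Because the stemmer is passed in as a callable, a library stemmer can replace the toy one. With NLTK installed, for example, `stem_tokens(tokens, PorterStemmer().stem)` would apply the Porter algorithm.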

Testing out our stemmer function

4. Lemmatization

Combining all text normalization functions

Text Representation

1. Processing Reviews

Term Frequency – Inverse Document Frequency (TF-IDF)

Creating a fit_tfidf function that fits the TF-IDF vectorizer on the corpus tokens (X)
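To make the weighting concrete, here is a plain-Python sketch of what fitting and applying TF-IDF involves. It assumes the textbook definitions (term frequency as count divided by document length, and idf = ln(N / df)); library vectorizers typically add smoothing and normalization, so their numbers will differ. `transform_tfidf` is a hypothetical companion helper, and the three-document corpus is made up.

```python
import math
from collections import Counter

def fit_tfidf(X):
    """Learn IDF weights from a corpus X given as a list of token lists."""
    n_docs = len(X)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in X for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def transform_tfidf(doc, idf):
    """Map one tokenized document to {term: tf * idf}, skipping unseen terms."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items() if t in idf}

# Hypothetical tokenized corpus, for illustration only
X = [["great", "phone"], ["bad", "phone"], ["great", "battery", "great"]]
idf = fit_tfidf(X)
weights = transform_tfidf(X[2], idf)
print(weights)
```

Terms that appear in every document get an IDF of zero under this definition, which is why rare, distinctive words end up with the largest weights.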

Sentiment Model

Helper function

This function plots the confusion matrix for each of the models we build.
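As a dependency-free illustration, the sketch below computes the confusion matrix and renders it as text rather than as a plot; a plotting library could draw the same matrix as a heatmap. Both function names and the label values are assumptions made for this example.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix

def format_confusion_matrix(matrix, labels):
    """Render the matrix as aligned text with one row per true label."""
    width = max(len(str(label)) for label in labels) + 2
    header = " " * width + "".join(str(label).rjust(width) for label in labels)
    rows = [
        str(label).ljust(width) + "".join(str(v).rjust(width) for v in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)

# Hypothetical labels and predictions, for illustration only
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
labels = ["neg", "pos"]
cm = confusion_matrix(y_true, y_pred, labels)
print(format_confusion_matrix(cm, labels))
```

The diagonal holds the correct predictions; off-diagonal cells show which class is being confused with which.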

1. Train/Test Split
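A minimal sketch of a reproducible split, assuming an 80/20 ratio and a fixed seed (the outline states neither). The `split_train_test` helper is hypothetical and written with the standard library only; it is not a library function.

```python
import random

def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle paired samples and split them into train and test portions."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical data: ten reviews with alternating labels
X = [f"review {i}" for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Shuffling indices rather than the data keeps each review paired with its label. For imbalanced sentiment classes, a stratified split would be worth considering.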

Logistic Regression

1. Model

1.1. TF-IDF
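To show what the model learns from TF-IDF features, here is a small from-scratch logistic regression trained by batch gradient descent. It is a sketch under stated assumptions: the learning rate, epoch count, and the two-feature vectors are made up, and a library implementation would normally be used on the real data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(score) - label  # gradient of log loss w.r.t. score
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

def predict(features, weights, bias):
    """Return 1 (positive) when the predicted probability is at least 0.5."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return int(sigmoid(score) >= 0.5)

# Hypothetical 2-feature vectors, e.g. TF-IDF weights for "great" and "bad"
X = [[0.9, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.9]]
y = [1, 1, 0, 0]
weights, bias = train_logistic_regression(X, y)
print([predict(x, weights, bias) for x in X])
```

The learned weight for the first feature ends up positive and the second negative, which matches the intuition that "great" pushes a review toward positive sentiment and "bad" toward negative.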

2. Performance Metrics

2.1. TF-IDF
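The outline does not say which metrics were reported, so the sketch below assumes four common ones for binary sentiment classification: accuracy, precision, recall, and F1. The `classification_metrics` helper and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels and predictions, for illustration only
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when positive reviews far outnumber negative ones, which is why precision and recall are usually reported alongside it.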

Pipeline

Creating a predict_review function that pre-processes a review, transforms it into features, and predicts its sentiment
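The three stages named above can be sketched as a function that chains them. Only the `predict_review` name comes from the text; the `normalize`, `vectorize`, and `classify` callables are toy stand-ins for the fitted components, and the "positive"/"negative" output labels are an assumption.

```python
def predict_review(review, normalize, vectorize, classify):
    """Pre-process a raw review, transform it to features, and predict sentiment."""
    tokens = normalize(review)     # text normalization step
    features = vectorize(tokens)   # text representation step
    return "positive" if classify(features) == 1 else "negative"

# Hypothetical stand-ins for the real components, for illustration only
def normalize(text):
    return text.lower().split()

def vectorize(tokens):
    # Two toy features: counts of one positive and one negative cue word
    return [tokens.count("great"), tokens.count("bad")]

def classify(features):
    return int(features[0] >= features[1])

print(predict_review("Great phone with a great screen", normalize, vectorize, classify))
```

Passing the components as arguments keeps the function independent of any one model, so the same pipeline works unchanged when the vectorizer or classifier is swapped.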